[exporter/elasticsearch] add dynamic document pipeline support for logs #37860

Open
wants to merge 2 commits into base: main

leehinman

Description

This PR adds a new config option, logs_dynamic_pipeline, that, when
enabled, reads the elasticsearch.document_pipeline attribute from each
log record and uses it as the ingest pipeline in Elasticsearch. This is
only implemented for logs, but a subsequent PR could add support for
metrics and traces.
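
For readers skimming the thread, the selection rule described above can be sketched roughly as follows. This is a minimal illustration and not the exporter's actual code: `resolvePipeline`, its parameters, and the plain `map[string]string` standing in for log record attributes are all hypothetical, and the fallback to the statically configured `pipeline` setting is an assumption based on the statement that current behavior is retained when the attribute is absent.

```go
package main

// dynamicPipelineAttr is the log record attribute named in this PR.
const dynamicPipelineAttr = "elasticsearch.document_pipeline"

// resolvePipeline is a hypothetical helper (not part of the exporter).
// attrs stands in for a log record's attributes, staticPipeline for the
// exporter-level `pipeline` setting, and enabled for
// logs_dynamic_pipeline::enabled.
func resolvePipeline(attrs map[string]string, staticPipeline string, enabled bool) string {
	// With the option disabled, the attribute is ignored entirely.
	if !enabled {
		return staticPipeline
	}
	// A present, non-empty attribute overrides the static pipeline
	// for this one log record.
	if p, ok := attrs[dynamicPipelineAttr]; ok && p != "" {
		return p
	}
	// Assumed fallback: keep the existing behavior.
	return staticPipeline
}
```

Under these assumptions, a record carrying `elasticsearch.document_pipeline: my-pipeline` would be routed to `my-pipeline` only when the option is enabled; every other record keeps the statically configured pipeline.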

Link to tracking issue

Fixes #37419

Testing

Added tests to verify that the document pipeline attribute can be read
from the log record and that the pipeline is properly forwarded to
Elasticsearch. Also asserted that when there is no document pipeline
attribute the current behavior is retained.

Manually tested that setting the option results in the named ingest
pipeline being run on the document in Elasticsearch.

Documentation

Updated the readme to mention the new logs_dynamic_pipeline config option.

@leehinman leehinman requested a review from a team as a code owner February 11, 2025 20:40

linux-foundation-easycla bot commented Feb 11, 2025

CLA Signed

The committers listed above are authorized under a signed CLA.

Contributor

@carsonip carsonip left a comment

thanks, code lgtm, just a nit on readme.

@@ -145,6 +145,9 @@ This can be customised through the following settings:
- `logs_dynamic_id` (optional): Dynamically determines the document ID to be used in Elasticsearch based on a log record attribute.
- `enabled`(default=false): Enable/Disable dynamic ID for log records. If `elasticsearch.document_id` exists and is not an empty string in the log record attributes, it will be used as the document ID. Otherwise, the document ID will be generated by Elasticsearch. The attribute `elasticsearch.document_id` is removed from the final document. See [Setting a document id dynamically](#setting-a-document-id-dynamically).

- `logs_dynamic_pipeline` (optional): Dynamically determines the ingest pipeline to be used in Elasticsearch based on a log record attribute.
- `enabled`(default=false): Enable/Disable dynamic pipeline for log records. If `elasticsearch.document_pipeline` exists and is not an empty string in the log record attributes, it will be used as the Elasticsearch ingest pipeline. The attribute `elasticsearch.document_pipeline` is removed from the final document.
Contributor

The attribute elasticsearch.document_pipeline is removed from the final document.

nit: While this is true for OTel mode, it doesn't seem to be the case for other modes. I'm fine with that, but we should state that in the documentation. (I think I either missed this for elasticsearch.document_id in its PR review or something has changed since then)

Author

Updated both.

- Added caveat that the attributes are only removed when `otel`
mapping mode is used
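
The caveat discussed in this thread can be illustrated with a small sketch. It is an assumption-laden illustration rather than the exporter's implementation: `documentAttributes` is a hypothetical helper, attributes are modeled as a plain `map[string]string`, and the mapping mode is passed as a string such as `otel` or `ecs`.

```go
package main

// Attribute names taken from the README text under review.
const (
	documentIDAttr       = "elasticsearch.document_id"
	documentPipelineAttr = "elasticsearch.document_pipeline"
)

// documentAttributes is a hypothetical helper (not part of the exporter).
// It returns the attributes that would end up in the final document,
// dropping the two routing attributes only in the otel mapping mode.
func documentAttributes(attrs map[string]string, mappingMode string) map[string]string {
	out := make(map[string]string, len(attrs))
	for k, v := range attrs {
		if mappingMode == "otel" && (k == documentIDAttr || k == documentPipelineAttr) {
			continue // removed from the final document in otel mode
		}
		out[k] = v // every other mode keeps the attribute as-is
	}
	return out
}
```

In this sketch, a record processed in `ecs` mode would still carry both attributes in the indexed document, which is the behavior the README caveat is meant to call out.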
@leehinman leehinman force-pushed the 37419_elasticsearch_pipeline_id branch from 642ff7f to 8ef035c Compare February 13, 2025 19:15
@VihasMakwana
Contributor

@open-telemetry/collector-contrib-maintainers can someone please approve the workflow?

@jpkrohling jpkrohling changed the title add dynamic document pipeline support for logs [exporter/elasticsearch] add dynamic document pipeline support for logs Feb 18, 2025
Member

@andrzej-stencel andrzej-stencel left a comment

Looks good, added a question and a docs suggestion.

@@ -188,6 +191,7 @@ Documents may be optionally passed through an [Elasticsearch Ingest pipeline] pr
This can be configured through the following settings:

- `pipeline` (optional): ID of an [Elasticsearch Ingest pipeline] used for processing documents published by the exporter.
- If `elasticsearch.document_pipeline` exists and is not an empty string in the log record attributes, then that pipeline will be used for that log record.
Member

Shouldn't this mention that this works when the logs_dynamic_pipeline::enabled property is set to true?

- `enabled`(default=false): Enable/Disable dynamic ID for log records. If `elasticsearch.document_id` exists and is not an empty string in the log record attributes, it will be used as the document ID. Otherwise, the document ID will be generated by Elasticsearch. The attribute `elasticsearch.document_id` is removed from the final document. See [Setting a document id dynamically](#setting-a-document-id-dynamically).
- `enabled`(default=false): Enable/Disable dynamic ID for log records. If `elasticsearch.document_id` exists and is not an empty string in the log record attributes, it will be used as the document ID. Otherwise, the document ID will be generated by Elasticsearch. The attribute `elasticsearch.document_id` is removed from the final document when the `otel` mapping mode is used. See [Setting a document id dynamically](#setting-a-document-id-dynamically).

- `logs_dynamic_pipeline` (optional): Dynamically determines the ingest pipeline to be used in Elasticsearch based on a log record attribute.
Member

[question] I wonder why this is a logs-specific setting. Would it make sense to make this available to other signals as well - metrics, traces, maybe profiles?

Contributor

@VihasMakwana VihasMakwana Feb 19, 2025

I agree. We can make it available for all of them. I don't think this is much work.

On a side note, I think we should do the same for the dynamic document ID setting, which was recently introduced. It is only supported for logs for now, but I think we can make it available for all signals (in another PR, of course).

Contributor

@VihasMakwana VihasMakwana left a comment

LGTM

Development

Successfully merging this pull request may close these issues.

[exporter/elasticsearch] enhancement request to support specifying pipeline in event
5 participants